Kriging

Introduction

What is Kriging?

Kriging is an intriguing model for spatial interpolation. We can estimate the number of crimes with a limited set of data points we have by using kriging. Kriging essentially models the spatial relationship between points and penalises points that are farther away from each other, giving less weight, as shown in the equation below.1

\hat Z(S_0) = \Sigma_{i = 1}^{N}\lambda_iZ(S_i) where

  • Z(S_i) = the measured value at the ith location

  • \lambda_i = weight for the measured value at the ith location

  • s_0 = the prediction location

  • N = the number of measured values

Unlike simpler methods, such as Inverse Distance Weighted Interpolation or Linear Regression, kriging interpolates based on the spatial distribution of empirical observations, that being the data points we have, instead of assuming a theoretical distribution. The interpolation ultimately results in a map of prediction surface.

Krigining analysis is broadly in three parts.

  1. Pre-process data
  2. Create variogram
  3. Make a prediction

Variogram

Variogram, also known as semi-variogram, is a diagram of semi-variance, which is a half of mean squared difference in the values of paired locations. At distance h between location i and location j, Semivariogram(distance_h) = 0.5 * average(value_i - value_j)^2). For each pair of points, semi-variance is plotted against distance between points. Hence, variogram shows the covariance between each pair of points.

After building variogram, distribution that best fits the variogram is selected, as shown below. While it is possible to specify a distribution of interest. it is also possible for a software to choose the best-fit model. Kriging then uses the fitted variogram values to make predictions at unsampled locations.

Pre-Process Data

Set Up

ASB (anti-social behaviour) has shown a positive auto-correlation. Therefore, we will continue working on ASB.

Code
#--Install / load packages
pacman::p_load(sp, sf, data.table, rio, here, leaflet, gstat, tidyverse, Metrics, scales, corrr, ggcorrplot, FactoMineR, factoextra, corrplot)

#--Import street-level asb data
asb <- import(here("3_output", "asb_with_nearest_distances.csv"))

#--Calculate count of crimes per location coordinate
asb_count <- asb |> 
    group_by(location.latitude, location.longitude) |>
    count() |> 
    ungroup() |> 
    inner_join(asb, by = c('location.latitude', 'location.longitude')) |> 
    distinct(location.latitude, location.longitude, .keep_all = TRUE) |>
    group_by(location.latitude, location.longitude) |>
    mutate(location_id = cur_group_id()) |>
    ungroup()

#--Rename columns 
names(asb_count)[grepl('longitude', names(asb_count))] <- 'x' 
names(asb_count)[grepl('latitude', names(asb_count))] <- 'y' 

#--Convert dataframe to sf object and reproject to OSGB36
asb_count_sf <- asb_count |> 
    st_as_sf(coords = c('x', 'y'), crs = 4326) 

#--Get coordinates
asb_count_sf <- asb_count_sf |>
    mutate(x = st_coordinates(asb_count_sf)[, 1],
           y = st_coordinates(asb_count_sf)[, 2])

#--Change the sf back to df
asb_count <- st_drop_geometry(asb_count_sf)

The very first step of kriging analysis is to pre-proecess the data into a count data by location pairs. At every unique coordinate of longitude and latitude (or x and y), the number of anti-social behaviour (ASB) was counted.

Split Data into Test & Train Sets

Code
#--Create random indices
total_rows <- nrow(asb_count)
sample_size <- round(total_rows * 0.75)

set.seed(1234) # for reproducibility
random_indices <- sample(1:total_rows, sample_size, replace = FALSE)

#--Create the test set using the random indices
train_asb <- asb_count |> filter(location_id %in% random_indices)

# Create the training set by excluding the indices used for the test set
test_asb <-  asb_count |> filter(!location_id %in% random_indices)

# Convert train and test sets to sp objects
coordinates(train_asb) <- c("x", "y")
proj4string(train_asb) <- CRS("+proj=longlat +datum=WGS84")
coordinates(test_asb) <- c("x", "y")
proj4string(test_asb) <- CRS("+proj=longlat +datum=WGS84")

The count of ASB data at each unique location was randomly split into test and train sets at a ratio of 75% and 25% in order to evaluate the kriging model.

Create Variogram

Code
#--Create variogram
vgm <- variogram(log(n) ~ x + y, train_asb, width=0.1)

#--Create the fitted curve line
fit <- fit.variogram(vgm, vgm(c("Gau", "Sph", "Mat", "Exp")), fit.kappa = TRUE)

#--Plot the curve function
plot(vgm, main = "Variogram")

Code
plot(fit, main = "Variogram Model Fit", cutoff = max(vgm$dist))

Code
 # When the curve plateaus, the point pairs are no longer spatially correlated

Create Grid

We need to create a surface onto which kriging model can make prediction. In other words, we need all possible points within the boundary of Barnet, which we can achieve by creating very small grid cells of equal size and extract their centroid points.

Code
#--Import Barnet shapefile
bnt_shp <- sf::st_read(here("1_data", "9_geo", "bnt_lad.json"), crs = 4326) |> 
    st_make_valid() |>
    st_transform(27700) # reprojects onto British National Grid, of which the unit is meter
#> Reading layer `OS - BoundaryLine - 2022Authorities - Barnet' from data source 
#>   `C:\Users\Hannah.Chang\OneDrive - London Borough of Barnet\General - I&I Hub\02. Project Documentation\07. Standard Projects\ARC_LBB Website\arc_lbb_website\1_data\9_geo\bnt_lad.json' 
#>   using driver `TopoJSON'
#> Simple feature collection with 1 feature and 17 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -0.3055738 ymin: 51.55528 xmax: -0.1291338 ymax: 51.67021
#> Geodetic CRS:  WGS 84

#--Make grid of 100m x 100m
grid <- bnt_shp |> 
    st_make_grid(cellsize = units::as_units(100, "m"), what = "centers") |>
    st_as_sf(crs = 27700)

#--Filter grid points to include only those within the Barnet polygon
result <- st_within(grid, bnt_shp) |>
    as.data.frame()

grid_bnt <- grid |> 
    mutate(row.id = 1:nrow(grid)) |> 
    left_join(result) |> 
    filter(!is.na(col.id))

#--Reproject to WGS84
grid_bnt_wgs84 <- grid_bnt |>
    st_transform(4326) |>
    rename(geometry = x)

#--Add longtidue and latitude 
grid_bnt_wgs84 <- grid_bnt_wgs84 |>
    mutate(x = st_coordinates(grid_bnt_wgs84)[, 1],
           y = st_coordinates(grid_bnt_wgs84)[, 2]) 

#--Check
ggplot() +
    geom_sf(data = grid_bnt, alpha = 0.3, colour = "#00AFA9") +
    geom_sf(data = bnt_shp, alpha = 0, lwd = 2, colour = "black") +
    ggtitle("All the points in the grids within Barnet boundary") +
    theme_minimal()

Perform Kriging

Code
#--Define components of kriging model
predictors <- c("x", "y")
formula_string <- paste("log(n)", "~", paste(predictors, collapse = " + "))
krige_formula <- as.formula(formula_string)

# Perform kriging over the grid
kriged_result_grid <- krige(
    formula = krige_formula,
    locations = train_asb,
    newdata = grid_bnt_wgs84,
    model = fit,
    maxdist = 10,
    nmax = 50
  ) 
#> [using universal kriging]



# Retrieve interpolated values for the grid
grid_fin <- grid_bnt_wgs84 |>
    select(-contains("id")) |>
    mutate(
        krige_pred = exp(kriged_result_grid$var1.pred),
        krige_var = exp(kriged_result_grid$var1.var)
    ) 

# Visualise on static map
ggplot() +
  geom_sf(data = grid_fin, aes(colour = krige_pred)) +
  scale_colour_gradient(low = "green", high = "red", name = "Predicted Value") +
   geom_sf(data = bnt_shp, alpha = 0, lwd = 1.5, colour = "black") +
  ggtitle("Hotspot Map for Anti-Social Behaviour Predicted by Kriging Model") +
  theme_minimal()

Code

# Visualise on interactive map
pal_asb <- leaflet::colorNumeric(palette = 'RdYlGn', grid_fin$krige_pred, rev = TRUE)

grid_fin$krige_popup <- paste0("Predicted Number of ASB: ", as.character(round(grid_fin$krige_pred, 0))) 

leaflet::leaflet(data = grid_fin) |>
    leaflet::addTiles() |> 
    leaflet::addPolygons(
        data = st_transform(bnt_shp, 4326),
        fillOpacity = 0,
        col = "black",
        opacity = 0.8) |>
    leaflet::addCircles(
    data = grid_fin,
    color = ~pal_asb(krige_pred),
    popup = ~krige_popup,
    radius = 10, # Adjust radius as needed
    stroke = TRUE,
    fillOpacity = 0.5
  )  |>
  addLegend('bottomright',
            pal =pal_asb,
            values = ~krige_pred,
            title = 'Predicted Count of ASB',
            opacity = 1)

Evaluate Kriging Model

Code
# Perform kriging over the test set
kriged_result_test <- krige(
    formula = krige_formula,
    locations = train_asb,
    newdata = test_asb,
    model = fit,
    maxdist = 10,
    nmax = 50
  ) |>
   st_as_sf()
#> [using universal kriging]

# Retrieve interpolated values for the test set
validation <- test_asb |>
    st_as_sf() |>
    select(n) |>
    mutate(
        krige_pred = exp(kriged_result_test$var1.pred),
        krige_var = exp(kriged_result_test$var1.var)
    )

# Calculate RMSE to evaluate model performance
rmse_asb <- Metrics::rmse(
    actual = validation$n,
    predicted = validation$krige_pred)

print(paste0("Root mean square error of the kriging model is ", round(rmse_asb,2 )))
#> [1] "Root mean square error of the kriging model is 11.94"

min_asb <- min(validation$krige_pred)
max_asb <- max(validation$krige_pred)

nrmse <- rmse_asb / (max_asb - min_asb)

print(paste0("Normalised root mean square error of the kriging model is ", round(nrmse, 2)))
#> [1] "Normalised root mean square error of the kriging model is 0.44"